Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks
Deep neural networks (DNNs) have become a widely deployed model for numerous
machine learning applications. However, their fixed architecture, substantial
training cost, and significant model redundancy make it difficult to
efficiently update them to accommodate previously unseen data. To solve these
problems, we propose an incremental learning framework based on a
grow-and-prune neural network synthesis paradigm. When new data arrive, the
neural network first grows new connections based on the gradients to increase
the network capacity to accommodate new data. Then, the framework iteratively
prunes away connections based on the magnitude of weights to enhance network
compactness, and hence recover efficiency. Finally, the model rests at a
lightweight DNN that is both ready for inference and suitable for future
grow-and-prune updates. The proposed framework improves accuracy, shrinks
network size, and significantly reduces the additional training cost for
incoming data compared to conventional approaches, such as training from
scratch and network fine-tuning. For the LeNet-300-100 and LeNet-5 neural
network architectures derived for the MNIST dataset, the framework reduces
training cost by up to 64% (63%) and 67% (63%) compared to training from
scratch (network fine-tuning), respectively. For the ResNet-18 architecture
derived for the ImageNet dataset and DeepSpeech2 for the AN4 dataset, the
corresponding training cost reductions against training from scratch (network
fine-tuning) are 64% (60%) and 67% (62%), respectively. Our derived models
contain fewer network parameters but achieve higher accuracy relative to
conventional baselines.
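The grow-and-prune update described above can be pictured as two mask operations: growth activates dormant connections with the largest gradient magnitudes, and pruning removes active connections with the smallest weight magnitudes. The following is a minimal NumPy sketch under an assumed dense-matrix representation, not the paper's implementation; all names are illustrative.

```python
import numpy as np

def grow_connections(weights, mask, grads, n_grow):
    """Activate the n_grow dormant connections with the largest gradient magnitude."""
    candidates = np.abs(grads) * (mask == 0)            # only dormant positions compete
    idx = np.argsort(candidates, axis=None)[-n_grow:]   # largest-gradient positions
    mask.flat[idx] = 1
    weights.flat[idx] = 0.0                             # grown connections start at zero
    return weights, mask

def prune_connections(weights, mask, n_prune):
    """Deactivate the n_prune active connections with the smallest weight magnitude."""
    magnitude = np.where(mask == 1, np.abs(weights), np.inf)
    idx = np.argsort(magnitude, axis=None)[:n_prune]
    mask.flat[idx] = 0
    weights.flat[idx] = 0.0
    return weights, mask
```

An incremental update would alternate these two steps on new data until the desired compactness is recovered.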
SCANN: Synthesis of Compact and Accurate Neural Networks
Deep neural networks (DNNs) have become the driving force behind recent
artificial intelligence (AI) research. An important problem with implementing a
neural network is the design of its architecture. Typically, such an
architecture is obtained manually by exploring its hyperparameter space and
kept fixed during training. This approach is time-consuming and inefficient.
Another issue is that modern neural networks often contain millions of
parameters, whereas many applications and devices require small inference
models. However, efforts to migrate DNNs to such devices typically entail a
significant loss of classification accuracy. To address these challenges, we
propose a two-step neural network synthesis methodology, called DR+SCANN, that
combines two complementary approaches to design compact and accurate DNNs. At
the core of our framework is the SCANN methodology that uses three basic
architecture-changing operations, namely connection growth, neuron growth, and
connection pruning, to synthesize feed-forward architectures with arbitrary
structure. SCANN encapsulates three synthesis methodologies that apply a
repeated grow-and-prune paradigm to three architectural starting points.
DR+SCANN combines the SCANN methodology with dataset dimensionality reduction
to alleviate the curse of dimensionality. We demonstrate the efficacy of SCANN
and DR+SCANN on various image and non-image datasets. We evaluate SCANN on
MNIST and ImageNet benchmarks. In addition, we evaluate the efficacy of
using dimensionality reduction alongside SCANN (DR+SCANN) on nine small to
medium-size datasets. We also show that our synthesis methodology yields neural
networks that are much better at navigating the accuracy vs. energy efficiency
space. This would enable neural network-based inference even on
Internet-of-Things sensors. Comment: 13 pages, 8 figures.
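The DR step in DR+SCANN reduces input dimensionality before architecture synthesis. The abstract does not name the reduction method, so as an illustrative stand-in, here is a plain PCA projection in NumPy:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)                        # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # reduced-dimensionality features
```

Feeding lower-dimensional inputs to the synthesis loop shrinks the first layer, which is often the largest in small feed-forward networks.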
PinMe: Tracking a Smartphone User around the World
With the pervasive use of smartphones that sense, collect, and process
valuable information about the environment, ensuring location privacy has
become one of the most important concerns in the modern age. A few recent
research studies discuss the feasibility of processing data gathered by a
smartphone to locate the phone's owner, even when the user does not intend to
share his location information, e.g., when the Global Positioning System (GPS)
is off. Previous research efforts rely on at least one of the two following
fundamental requirements, which significantly limit the ability of the
adversary: (i) the attacker must accurately know either the user's initial
location or the set of routes through which the user travels and/or (ii) the
attacker must measure a set of features, e.g., the device's acceleration, for
potential routes in advance and construct a training dataset. In this paper, we
demonstrate that neither of the above-mentioned requirements is essential for
compromising the user's location privacy. We describe PinMe, a novel
user-location mechanism that exploits non-sensory/sensory data stored on the
smartphone, e.g., the environment's air pressure, along with publicly-available
auxiliary information, e.g., elevation maps, to estimate the user's location
when all location services, e.g., GPS, are turned off. Comment: This is the preprint version; the paper has been published in IEEE
Trans. Multi-Scale Computing Systems, DOI: 10.1109/TMSCS.2017.275146
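PinMe's use of air pressure rests on the fact that pressure maps to elevation, which can then be matched against public elevation maps along a trajectory. A minimal sketch of that first step, using the international barometric formula with a standard-atmosphere sea-level reference (an assumption for illustration, not the paper's code):

```python
def pressure_to_elevation(p_hpa, p0_hpa=1013.25):
    """International barometric formula: air pressure in hPa -> altitude in meters,
    assuming the standard atmosphere and sea-level reference pressure p0_hpa."""
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))
```

Comparing the estimated elevation profile of a drive or walk against elevation maps is what lets an attacker narrow down the route without GPS.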
SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference
CNNs outperform traditional machine learning algorithms across a wide range
of applications. However, their computational complexity makes it necessary to
design efficient hardware accelerators. Most CNN accelerators focus on
exploring dataflow styles that exploit computational parallelism. However,
potential performance speedup from sparsity has not been adequately addressed.
The computation and memory footprint of CNNs can be significantly reduced if
sparsity is exploited in network evaluations. To take advantage of sparsity,
some accelerator designs explore sparsity encoding and evaluation on CNN
accelerators. However, sparsity encoding is typically performed only on activations or
weights, and only during inference. It has been shown that activations and weights
also exhibit high sparsity levels during training. Hence, sparsity-aware computation
should also be considered in training. To further improve performance and
energy efficiency, some accelerators evaluate CNNs with limited precision.
However, this has been limited to inference, since reduced precision sacrifices
network accuracy if used in training. In addition, CNN evaluation is usually
memory-intensive, especially in training. In this paper, we propose SPRING, a
SParsity-aware Reduced-precision Monolithic 3D CNN accelerator for trainING and
inference. SPRING supports both CNN training and inference. It uses a binary
mask scheme to encode sparsity in activations and weights. It uses the
stochastic rounding algorithm to train CNNs with reduced precision without
accuracy loss. To alleviate the memory bottleneck in CNN evaluation, especially
in training, SPRING uses an efficient monolithic 3D NVM interface to increase
memory bandwidth. Compared to GTX 1080 Ti, SPRING achieves 15.6X, 4.2X and
66.0X improvements in performance, power reduction, and energy efficiency,
respectively, for CNN training, and 15.5X, 4.5X and 69.1X improvements for
inference.
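The stochastic rounding that lets SPRING train at reduced precision without accuracy loss works by rounding a value up with probability equal to its fractional distance, so the expected rounded value equals the original. A NumPy sketch of the idea (illustrative, not the hardware algorithm):

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Round each element of x to a multiple of `step`; round up with probability
    equal to the fractional part, so the result is unbiased in expectation."""
    scaled = np.asarray(x, dtype=float) / step
    floor = np.floor(scaled)
    frac = scaled - floor
    round_up = rng.random(scaled.shape) < frac
    return (floor + round_up) * step
```

Because the rounding error has zero mean, gradient updates accumulated over many steps are not systematically biased, unlike round-to-nearest at very low precision.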
MOLECULAR DOCKING STUDIES FOR THE COMPARATIVE ANALYSIS OF DIFFERENT BIOMOLECULES TO TARGET HYPOXIA INDUCIBLE FACTOR-1α
Objective: Hypoxia plays a significant role in governing many vital signalling molecules in the central nervous system (CNS). Hypoxic exposure has also been depicted as a stimulus for oxidative stress, an increase in lipid peroxidation, DNA damage, blood-brain barrier dysfunction, impaired calcium (Ca2+) homoeostasis, and agglomeration of oxidized biomolecules in neurons, which act as a novel signature in diverse neurodegenerative and oncogenic processes. Conversely, abnormally impaired expression of HIF-1α under hypoxic insult could serve as an indication of the existence of tumors and of neuronal dysfunction as well. For instance, under hypoxic stress, amyloid-β protein precursor (AβPP) cleavage is triggered due to the higher expression of HIF-1α and thus leads to synaptic loss. The objective of this research is to perform comparative studies of biomolecules in regulating HIF-1α activity based on in silico approaches that could establish a potential therapeutic window for the treatment of different abnormalities associated with impaired HIF-1α. Methods: We employed various in silico methods, including drug-likeness parameters (Lipinski filter analysis), the MUSCLE tool, SWISS-MODEL, active-site prediction, AutoDock 4.2.1, and LigPlot 1.4.5 for molecular docking studies. Results: The 3D structure of HIF-1α was generated and a Ramachandran plot obtained for quality assessment. RAMPAGE displayed 99.5% of residues in the most favoured regions, 0% in additionally allowed regions, and 0.5% in disallowed regions of the HIF-1α protein. Further, initial screening of the molecules was done based on Lipinski's rule of five. The CASTp server, used to predict the ligand-binding site, suggests that this protein can be utilised as a potential drug target.
Finally, we found Naringenin to be the most effective among the three biomolecules in modulating HIF-1α, based on the minimum inhibition constant, Ki, and the highest negative free energy of binding with the maximum interacting surface area during docking studies. Conclusion: The present study outlines the novel potential of biomolecules in regulating HIF-1α activity for the treatment of different abnormalities associated with impaired HIF-1α.
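Lipinski's rule of five, used above for initial screening, checks four molecular descriptors; a candidate is conventionally considered drug-like if it violates at most one threshold. A sketch with a plain descriptor dict (the keys are hypothetical; a real pipeline would compute the descriptors with a cheminformatics toolkit):

```python
def passes_lipinski(descriptors):
    """Lipinski's rule of five: at most one violation of the four thresholds."""
    violations = sum([
        descriptors["mol_weight"] > 500,     # molecular weight <= 500 Da
        descriptors["logp"] > 5,             # octanol-water partition coeff. <= 5
        descriptors["h_donors"] > 5,         # hydrogen-bond donors <= 5
        descriptors["h_acceptors"] > 10,     # hydrogen-bond acceptors <= 10
    ])
    return violations <= 1
```

For example, naringenin's published descriptors (MW ≈ 272 Da, logP ≈ 2.5, 3 donors, 5 acceptors) pass the filter.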
CTRL: Clustering Training Losses for Label Error Detection
In supervised machine learning, use of correct labels is extremely important
to ensure high accuracy. Unfortunately, most datasets contain corrupted labels.
Machine learning models trained on such datasets do not generalize well. Thus,
detecting their label errors can significantly increase their efficacy. We
propose a novel framework, called CTRL (Clustering TRaining Losses for label
error detection), to detect label errors in multi-class datasets. It detects
label errors in two steps based on the observation that models learn clean and
noisy labels in different ways. First, we train a neural network using the
noisy training dataset and obtain the loss curve for each sample. Then, we
apply clustering algorithms to the training losses to group samples into two
categories: cleanly-labeled and noisily-labeled. After label error detection,
we remove samples with noisy labels and retrain the model. Our experimental
results demonstrate state-of-the-art error detection accuracy on both image
(CIFAR-10 and CIFAR-100) and tabular datasets under simulated noise. We also
use a theoretical analysis to provide insight into why CTRL performs so well.
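The two-step detection can be sketched as clustering a per-sample loss statistic into two groups and flagging the higher-loss cluster. This toy version clusters the mean of each loss curve with a simple two-means loop; the paper clusters the full curves, so every detail here is illustrative:

```python
import numpy as np

def flag_label_errors(loss_curves, n_iter=50):
    """Two-means clustering on per-sample mean training loss: samples whose
    losses stay high across epochs are flagged as noisily labeled."""
    scores = loss_curves.mean(axis=1)                 # one scalar per sample
    centers = np.array([scores.min(), scores.max()])  # init at the extremes
    for _ in range(n_iter):
        assign = np.abs(scores[:, None] - centers[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = scores[assign == k].mean()
    return assign == np.argmax(centers)               # True -> suspected label error
```

After flagging, the suspected samples would be removed and the model retrained on the remainder.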
TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference
Automated co-design of machine learning models and evaluation hardware is
critical for efficiently deploying such models at scale. Despite the
state-of-the-art performance of transformer models, they are not yet ready for
execution on resource-constrained hardware platforms. High memory requirements
and low parallelizability of the transformer architecture exacerbate this
problem. Recently-proposed accelerators attempt to optimize the throughput and
energy consumption of transformer models. However, such works are either
limited to a one-sided search of the model architecture or a restricted set of
off-the-shelf devices. Furthermore, previous works only accelerate model
inference, not training, even though training requires substantially more memory
and compute resources, making the problem even more challenging. To address these
limitations, this work proposes a dynamic training framework, called DynaProp,
that speeds up the training process and reduces memory consumption. DynaProp is
a low-overhead pruning method that prunes activations and gradients at runtime.
To effectively execute this method on hardware for a diverse set of transformer
architectures, we propose ELECTOR, a framework that simulates transformer
inference and training on a design space of accelerators. We use this simulator
in conjunction with the proposed co-design technique, called TransCODE, to
obtain the best-performing models with high accuracy on the given task and
minimize latency, energy consumption, and chip area. The obtained
transformer-accelerator pair achieves 0.3% higher accuracy than the
state-of-the-art pair while incurring 5.2× lower latency and 3.0×
lower energy consumption.
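DynaProp's runtime pruning can be illustrated as keeping only the top-magnitude fraction of an activation tensor and zeroing the rest. A NumPy sketch (the threshold rule and keep ratio are assumptions for illustration, not the paper's exact method):

```python
import numpy as np

def prune_activations(acts, keep_ratio=0.5):
    """Zero out low-magnitude activations, keeping the top keep_ratio fraction."""
    flat = np.abs(acts).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]            # k-th largest magnitude
    return np.where(np.abs(acts) >= threshold, acts, 0.0)
```

The appeal at training time is that the same sparsification can be applied to gradients, shrinking both compute and the activation memory that dominates training footprints.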
BIOMOLECULES MEDIATED TARGETING OF VASCULAR ENDOTHELIAL GROWTH FACTOR IN NEURONAL DYSFUNCTION: AN IN SILICO APPROACH
Objective: Neurodegenerative diseases are debilitating age-related disorders manifested by memory loss, impaired motor activity, and loss of muscle tone due to the accumulation of toxic metabolites in the brain. Despite knowledge of the factors causing neurodegenerative disorders, they remain irreversible and incurable. Growing evidence has currently advocated the physiological and pathological contribution of hypoxia-induced vascular endothelial growth factor (VEGF) to neuronal loss. This research report highlights biomolecule-mediated targeting of VEGF activity based on in silico approaches that could establish a potential therapeutic window for the treatment of different abnormalities associated with impaired VEGF. Methods: We employed various in silico methods, including drug-likeness parameters (Lipinski filter analysis), the PockDrug tool for active-site prediction, AutoDock 4.2.1, and LigPlot 1.4.5 for molecular docking studies. Results: The three-dimensional structure of VEGF was generated and a Ramachandran plot obtained for quality assessment. RAMPAGE displayed 99.5% of residues in the most favored regions, 0.5% in additionally allowed regions, and no residues in disallowed regions of VEGF, showing that the stereochemical quality of the protein structure is good. Further, initial screening of the molecules was done based on Lipinski's rule of five. Finally, we found Naringenin to be the most effective among the three biomolecules in modulating VEGF activity, based on the minimum inhibition constant, Ki, and the highest negative free energy of binding with the maximum interacting surface area during docking studies. Conclusion: The present study outlines the novel potential of biomolecules in regulating VEGF activity for the treatment of different abnormalities associated with impaired VEGF.
EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms
Automated design of efficient transformer models has recently attracted
significant attention from industry and academia. However, most works only
focus on certain metrics while searching for the best-performing transformer
architecture. Furthermore, running traditional, complex, and large transformer
models on low-compute edge platforms is a challenging problem. In this work, we
propose a framework, called ProTran, to profile the hardware performance
measures for a design space of transformer architectures and a diverse set of
edge devices. We use this profiler in conjunction with the proposed co-design
technique to obtain the best-performing models that have high accuracy on the
given task and minimize latency, energy consumption, and peak power draw to
enable edge deployment. We refer to our framework for co-optimizing accuracy
and hardware performance measures as EdgeTran. It searches for the best
transformer model and edge device pair. Finally, we propose GPTran, a
multi-stage block-level grow-and-prune post-processing step that further
improves accuracy in a hardware-aware manner. The obtained transformer model is
2.8× smaller and has a 0.8% higher GLUE score than the baseline
(BERT-Base). Inference with it on the selected edge device enables 15.0% lower
latency, 10.0× lower energy, and 10.8× lower peak power draw
compared to an off-the-shelf GPU.
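At its simplest, the co-design selection amounts to scanning profiled (model, device) pairs for the most accurate one that meets a latency budget. The real framework searches a large design space with a profiler and grow-and-prune post-processing; this exhaustive sketch with made-up names only illustrates the selection rule:

```python
def best_pair(accuracy, devices, latency, budget):
    """Return the (model, device) pair with the highest accuracy whose profiled
    latency fits the budget; None if no pair is feasible."""
    feasible = [(accuracy[m], m, d)
                for m in accuracy for d in devices
                if latency[(m, d)] <= budget]
    if not feasible:
        return None
    _, m, d = max(feasible)
    return m, d
```

In practice the objective would also weigh energy and peak power, and the search would be guided rather than exhaustive.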
DETECTING THE DEPTH OF THE CRACK BY LASER SPOT THERMOGRAPHY
Understanding the properties of cracks can give detailed insight into the health and status of any structure. Some cracks may not be detrimental, while others may cause the structure to collapse if not inspected, recognized, and repaired ahead of time. This article addresses detecting the depth of a crack using laser spot thermography. A 3D finite element analysis of a laser beam as a heat source and a steel specimen with cracks of various depths is performed using COMSOL Multiphysics 5.5. Then the relationship between the crack depth and the temperature differential index is studied using regression analysis. Finally, the equation obtained from the regression analysis is used to predict the depth of arbitrary cracks. The predicted depths are verified against the actual depths. The results are accurate, with the error ranging from +0.3 mm to -0.2 mm.
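The regression step maps the temperature differential index to crack depth, and the fitted equation then predicts depth for new cracks. A sketch with synthetic data standing in for the COMSOL simulation results (the linear form and the values are assumptions for illustration):

```python
import numpy as np

def fit_depth_model(temp_index, depth, degree=1):
    """Fit a polynomial regression of crack depth on the temperature
    differential index and return a callable depth predictor."""
    coeffs = np.polyfit(temp_index, depth, degree)
    return np.poly1d(coeffs)
```

With real simulation data, the residuals of this fit would correspond to the ±0.3 mm error band reported above.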